THE BEHAVIOR OF THE NPMLE OF A DECREASING DENSITY
Eurandom and Delft University of Technology

We investigate the behavior of the nonparametric maximum likelihood estimator $\hat f_n$ for a decreasing density $f$ near the boundaries of the support of $f$. We establish the limiting distribution of $\hat f_n(n^{-\alpha})$, where we need to distinguish between different values of $0 < \alpha < 1$. Similar results are obtained for the upper endpoint of the support, in the case it is finite. This yields consistent estimators for the values of $f$ at the boundaries of the support. The limit distribution of these estimators is established and their performance is compared with the penalized maximum likelihood estimator.

1. Introduction. In various statistical models, such as density estima- 
tion and estimation of regression curves or hazard rates, monotonicity con- 
straints can arise naturally. For these situations certain isotonic estimators 
have been in use for considerable time. Often these estimators can be seen 
as maximum likelihood estimators in a semiparametric setting. Although 
conceptually these estimators have great appeal and are easy to formulate, 
their distributional properties are usually of a very complicated nature. 

In the context of density estimation, the nonparametric maximum likelihood estimator $\hat f_n$ for a nonincreasing density $f$ on $[0,\infty)$ was studied by Grenander [2]. It is defined as the left derivative of the least concave majorant (LCM) of the empirical distribution function $F_n$ constructed from a sample from $f$. Prakasa Rao [11] obtained the asymptotic pointwise behavior of $\hat f_n$. Groeneboom [3] provided an elegant proof of the same result, which can be formulated as follows. For each $x_0 > 0$,

(1.1)  $|4f(x_0)f'(x_0)|^{-1/3}\,n^{1/3}\{\hat f_n(x_0) - f(x_0)\} \to \operatorname{argmax}_{t\in\mathbb{R}}\{W(t) - t^2\}$

in distribution, where $W$ denotes standard two-sided Brownian motion originating from zero. The first distributional result for a global measure of deviation for $\hat f_n$ was found by Groeneboom [3], concerning asymptotic normality of the $L_1$-distance $\|\hat f_n - f\|_1$ (see [4] for a rigorous proof).

Apart from estimating a monotone density $f$ on $(0,\infty)$, the estimation of the value of $f$ or its derivatives at zero is required in various statistical applications. There is a direct connection with renewal processes, where the backward recurrence time in equilibrium has density $f(x) = (1 - G(x))/\mu$, where $G$ and $\mu$ are the distribution function and mean of the interarrival times (see [1]). Clearly, $f$ is decreasing and a natural parameter of interest is $\mu = 1/f(0)$. An interesting application is in the context of natural fecundity of human populations, where one is interested in the time $T$ it takes for a couple from initiating attempts to become pregnant until conception occurs. Keiding, Kvist, Hartvig and Tvede [6] investigated a current-duration design where data are collected from a cross-sectional sample of couples that are currently attempting to become pregnant. If $U$ is the time to discontinuation without success and $V$ is the time to discontinuation of follow-up, then $X = T \wedge U$ is the waiting time until termination for whatever reason, and $Y = T \wedge U \wedge V$ is the observed experience waiting time. When the initiations happen according to a homogeneous Poisson process, $Y$ is distributed as the backward recurrence time in a renewal process in equilibrium, and the survival function of $X$ is $f(x)/f(0)$, where $f$ is decreasing. Woodroofe and Sun [13] provide a different application in the context of astronomy. If $Y$ denotes the normalized angular diameter of a galaxy, conditional on its being observed, then $1/Y^3$ has a nonincreasing density $f$ and the proportion of galaxies that are observed is $1/f(0)$. Another example is from Hampel [5], who studied the sojourn time of migrating birds. Under certain model assumptions, the expected sojourn time is $-f(0)/f'(0)$, where $f$ is the (convex) decreasing density of the time span between capture and recapture of a bird.

In contrast to (1.1), Woodroofe and Sun [13] showed that $\hat f_n$ is not consistent at zero. They proposed a penalized maximum likelihood estimator $\hat f_n^P(0)$ and in [12] it was shown that

$n^{1/3}\{\hat f_n^P(0) - f(0)\} \to \sup_{t>0} \frac{W(t) - (c - \frac{1}{2} f(0) f'(0) t^2)}{t}$,

where $c$ depends on the penalization. Surprisingly, the inconsistency of $\hat f_n$ at zero does not influence the behavior of $\|\hat f_n - f\|_1$. Nevertheless, the inconsistency at the boundaries will have an effect if one studies other global measures of deviation, such as the $L_k$-distance, for $k$ larger than 1, or the supremum distance.

In this paper we study the behavior of the Grenander estimator at the boundaries of the support of $f$. We first consider a nonincreasing density $f$ on $[0,\infty)$ and investigate the behavior of

(1.2)  $n^{\beta}\{\hat f_n(cn^{-\alpha}) - f(cn^{-\alpha})\}$

for $c > 0$, where $0 < \alpha < 1$ and $\beta > 0$ are chosen suitably in order to make (1.2) converge in distribution. Our results will imply that when $f'(0) < 0$, then $\hat f_n(cn^{-1/3})$ is a consistent estimator for $f(0)$ at rate $n^{1/3}$ with a limiting distribution that is a functional of $W$. This immediately yields $\hat f_n^S(0) = \hat f_n(n^{-1/3})$ as a simple estimator for $f(0)$. A more adaptive alternative would be to find the value of $c$ that minimizes the asymptotic mean squared error. This turns out to depend on $f$ and then has to be estimated. The resulting estimator $\hat f_n^A(0) = \hat f_n(\hat c\, n^{-1/3})$ will be compared with the penalized maximum likelihood estimator from [12]. We will also consider the case where $f'(0) = 0$ and $f''(0) < 0$, which requires different values for $c$ and $\alpha$. For nonincreasing $f$ with compact support, say $[0,1]$, we also investigate the behavior near 1. Similarly, this leads to a consistent estimator for $f(1)$. Moreover, the results on the behavior of $\hat f_n$ at the boundaries of $[0,1]$ allow an adequate treatment of the $L_k$-distance between $\hat f_n$ and $f$. It turns out that for $k > 2.5$, the inconsistency of $\hat f_n$ starts to affect the behavior of $\|\hat f_n - f\|_k$ (see [10]).

In Section 2 we give a brief outline of our approach for studying differences 
such as (1.2) and state some preliminary results for the argmax functional. 
Section 3 is devoted to the behavior of $\hat f_n$ near zero. Section 4 deals with the behavior of $\hat f_n$ near the boundary at the other end of the support for a density $f$ on $[0,1]$. In Section 5 we propose two estimators $\hat f_n^S(0)$ and $\hat f_n^A(0)$ based on the presented theory, and compare these with the penalized maximum likelihood estimator from Sun and Woodroofe [12].

2. Preliminaries. Instead of studying the process $\{\hat f_n(t) : t \ge 0\}$ itself, we will use the more tractable inverse process $\{U_n(a) : a \ge 0\}$, where $U_n(a)$ is defined as the last time that the process $F_n(t) - at$ attains its maximum,

$U_n(a) = \operatorname{argmax}_{t\in[0,\infty)}\{F_n(t) - at\}$.

Its relation with $\hat f_n$ is as follows: with probability 1,

(2.1)  $\hat f_n(x) \le a \iff U_n(a) \le x$.
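
The definitions above can be made concrete numerically. The following is a minimal sketch, not the authors' implementation: it assumes a sample of distinct, strictly positive observations, and the helper names `lcm_of_ecdf`, `f_hat` and `U` are hypothetical. It builds the LCM of the empirical distribution function, evaluates its left derivative, computes the inverse process as the last maximizer of $F_n(t) - at$ over the jump points of $F_n$, and checks the switching relation (2.1) at randomly drawn pairs $(x, a)$.

```python
import numpy as np


def lcm_of_ecdf(sample):
    """Vertices and slopes of the least concave majorant of the ECDF.

    Assumes distinct, strictly positive observations.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))      # abscissae 0, x_(1), ..., x_(n)
    py = np.arange(n + 1) / n            # ECDF heights 0, 1/n, ..., 1
    hull = [0]
    for i in range(1, n + 1):
        # Drop the last vertex while it lies on or below the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            lhs = (py[b] - py[a]) * (px[i] - px[b])
            rhs = (py[i] - py[b]) * (px[b] - px[a])
            if lhs <= rhs:
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    slopes = np.diff(py[hull]) / np.diff(knots)
    return knots, slopes


def f_hat(x, knots, slopes):
    """Left derivative of the LCM at x > 0; zero beyond the largest observation."""
    if x > knots[-1]:
        return 0.0
    j = int(np.searchsorted(knots, x, side="left")) - 1  # knots[j] < x <= knots[j+1]
    return float(slopes[max(j, 0)])


def U(a, sample):
    """Last point at which F_n(t) - a*t attains its maximum over t >= 0."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    vals = py - a * px   # for a > 0 the maximum sits at 0 or at a jump point
    return float(px[np.flatnonzero(vals == vals.max())[-1]])


rng = np.random.default_rng(0)
sample = rng.exponential(size=200)
knots, slopes = lcm_of_ecdf(sample)
checks = [
    (f_hat(x, knots, slopes) <= a) == (U(a, sample) <= x)
    for x, a in zip(rng.uniform(0.01, 3.0, 500), rng.uniform(0.01, 2.0, 500))
]
print(all(checks))
```

The relation is only claimed with probability 1, so the check draws $x$ and $a$ from continuous distributions; at a knot of the LCM, or for $a$ exactly equal to one of its slopes, the two sides can disagree.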

Let us first describe the line of reasoning used to prove convergence in distribution of (1.2). We illustrate things for the case $c = 1$, $0 < \alpha < 1/3$, and $f'(0) < 0$. It turns out that in this case the proper choice for $\beta$ is 1/3. Hence, we will consider events of the type

$n^{1/3}\{\hat f_n(n^{-\alpha}) - f(n^{-\alpha})\} \le x$.

According to relation (2.1), this event is equivalent to

$U_n(f(n^{-\alpha}) + xn^{-1/3}) - n^{-\alpha} \le 0$.

The left-hand side is the argmax of the process

$Z_n(t) = F_n(t + n^{-\alpha}) - f(n^{-\alpha})t - xtn^{-1/3}$.

With suitable scaling, the process $Z_n$ converges in distribution to some Gaussian process $Z$. The next step is to use an argmax version of the continuous mapping theorem from [7]. The version that suffices for our purposes is stated below for further reference.

Theorem 2.1. Let $\{Z(t) : t \in \mathbb{R}\}$ be a continuous random process satisfying:

(i) $Z$ has a unique maximum with probability 1.

(ii) $Z(t) \to -\infty$, as $|t| \to \infty$, with probability 1.

Let $\{Z_n(t) : t \in \mathbb{R}\}$ be a sequence of random processes satisfying:

(iii) $\operatorname{argmax}_{t\in\mathbb{R}} Z_n(t) = O_p(1)$, as $n \to \infty$.

If $Z_n$ converges in distribution to the process $Z$ in the topology of uniform convergence on compacta, then $\operatorname{argmax}_{t\in\mathbb{R}} Z_n(t)$ converges in distribution to $\operatorname{argmax}_{t\in\mathbb{R}} Z(t)$.

This theorem yields that $U_n(f(n^{-\alpha}) + xn^{-1/3})$, properly scaled, converges in distribution to the argmax of a Gaussian process. Convergence of (1.2) then follows from another application of (2.1).

The main difficulty in verifying the conditions of Theorem 2.1 is showing that (iii) holds. It requires careful handling of all small order terms in the expansion of the process. In proving condition (iii) we will frequently use the following lemma, which enables us to suitably bound the argmax from above.

Lemma 2.1. Let $f$ and $g$ be continuous functions on $K \subset \mathbb{R}$.

(i) Suppose that $g$ is nonincreasing. Then $\operatorname{argmax}_{x\in K}\{f(x) + g(x)\} \le \operatorname{argmax}_{x\in K} f(x)$.

(ii) Let $C > 0$ and suppose that for all $s, t \in K$ such that $t \ge C + s$, we have that $g(t) \le g(s)$. Then $\operatorname{argmax}_{x\in K}\{f(x) + g(x)\} \le C + \operatorname{argmax}_{x\in K} f(x)$.

In studying processes like $Z_n$ we will use a Brownian approximation similar to the one used in [4]. Let $E_n$ denote the empirical process $\sqrt{n}(F_n - F)$. For $n \ge 1$, let $B_n$ be versions of the Brownian bridge constructed on the same probability space as the uniform empirical process $E_n \circ F^{-1}$ via the Hungarian embedding, where

(2.2)  $\sup_t |E_n(t) - B_n(F(t))| = O_p(n^{-1/2}\log n)$

(see [8]). Define versions $W_n$ of Brownian motion by

$W_n(t) = B_n(t) + \xi_n t$,  $t \in [0,1]$,

where $\xi_n$ is a standard normal random variable independent of $B_n$. This means that we can represent $B_n$ by the pathwise equality $B_n(t) = W_n(t) - tW_n(1)$.
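
The pathwise link between a Brownian bridge and a Brownian motion can be illustrated on a discrete grid. This is only a sketch under stated assumptions: the processes are approximated by Gaussian random walks on an equispaced grid of $[0,1]$, and the function names are hypothetical. It shows that $W(t) - tW(1)$ is pinned at zero at both endpoints and that adding back $\xi t$ with $\xi = W(1)$ recovers the original path.

```python
import numpy as np


def brownian_motion(m, rng):
    """Brownian motion sampled on the grid j/m, j = 0, ..., m."""
    steps = rng.normal(scale=np.sqrt(1.0 / m), size=m)
    return np.concatenate(([0.0], np.cumsum(steps)))


def bridge_from_motion(w):
    """B(t) = W(t) - t W(1) on the same grid."""
    t = np.linspace(0.0, 1.0, w.size)
    return w - t * w[-1]


def motion_from_bridge(b, xi):
    """W(t) = B(t) + xi * t, with xi playing the role of W(1)."""
    t = np.linspace(0.0, 1.0, b.size)
    return b + xi * t


rng = np.random.default_rng(1)
w = brownian_motion(1000, rng)
b = bridge_from_motion(w)
print(b[0], b[-1])                                   # both endpoints are pinned at zero
print(np.allclose(motion_from_bridge(b, w[-1]), w))  # the two representations agree
```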

We will often apply a Brownian scaling argument in connection with argmax functionals. Note that $\operatorname{argmax}_t\{Z(t)\}$ does not change by multiplying $Z$ by a constant, and that the process $W(bt)$ has the same distribution as the process $b^{1/2}W(t)$. This implies that

(2.3)  $a\,\operatorname{argmax}_{t\in I}\{W(bt) - ct^k\} = \operatorname{argmax}_{t\in aI}\{W(ba^{-1}t) - ca^{-k}t^k\}$
       $\stackrel{d}{=} \operatorname{argmax}_{t\in aI}\{b^{1/2}a^{-1/2}W(t) - ca^{-k}t^k\}$
       $= \operatorname{argmax}_{t\in aI}\{W(t) - cb^{-1/2}a^{-k+1/2}t^k\}$

for $I \subset \mathbb{R}$ and constants $a, b > 0$ and $c \in \mathbb{R}$.
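
As an illustration of (2.3), consider the special case $k = 1$, $I = [0,\infty)$, $b = f(0)$, $c = x > 0$ and $a = x^2/f(0)$. These particular values are chosen here for the example; they are the ones that normalize the drift coefficient to 1. A short check under these assumptions:

```latex
\begin{align*}
c\,b^{-1/2}a^{-k+1/2}
  &= x\,f(0)^{-1/2}\Bigl(\frac{x^{2}}{f(0)}\Bigr)^{-1/2} = 1,\\
\frac{x^{2}}{f(0)}\,\operatorname*{argmax}_{t\ge 0}\{W(f(0)t)-xt\}
  &\stackrel{d}{=}\operatorname*{argmax}_{t\ge 0}\{W(t)-t\}.
\end{align*}
```

Since $aI = [0,\infty)$ for every $a > 0$, the index set is unchanged by the scaling in this case.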

3. Behavior near zero. We first consider the case that $f$ is a nonincreasing density on $[0,\infty)$ satisfying:

(C1) $0 < f(0) = \lim_{x\downarrow 0} f(x) < \infty$.

(C2) For some $k \ge 1$, $0 < |f^{(k)}(0)| \le \sup_{s\ge 0}|f^{(k)}(s)| < \infty$, with $f^{(k)}(0) = \lim_{x\downarrow 0} f^{(k)}(x)$, and $f^{(i)}(0) = 0$ for $1 \le i \le k-1$.

Under these conditions we determine the behavior of the Grenander estimator near zero. With the proper normalizing constants the limit distribution of $n^{\beta}(\hat f_n(n^{-\alpha}) - f(n^{-\alpha}))$ is independent of $f$. Define $D[Z(t)](a)$ as the right derivative of the LCM on $\mathbb{R}$ of the process $Z(t)$ at the point $t = a$, and define $D_R$ similarly, where the LCM is restricted to the set $t \ge 0$.

Theorem 3.1. Suppose $f$ satisfies conditions (C1) and (C2) and let $c > 0$. Then:

(i) For $1/(2k+1) < \alpha < 1$ and $A_1 = (c/f(0))^{1/2}$, the sequence

$A_1 n^{(1-\alpha)/2}(\hat f_n(cn^{-\alpha}) - f(cn^{-\alpha}))$

converges in distribution to $D_R[W(t)](1)$ as $n \to \infty$.

(ii) For $\alpha = 1/(2k+1)$, $B_{2k} = (f(0)^{1/2}|f^{(k)}(0)|^{-1}(k+1)!)^{2/(2k+1)}$ and $A_{2k} = \sqrt{B_{2k}/f(0)}$, the sequence

$A_{2k}\Bigl\{n^{k/(2k+1)}(\hat f_n(cB_{2k}n^{-\alpha}) - f(cB_{2k}n^{-\alpha})) + \frac{f^{(k)}(0)(cB_{2k})^k}{k!}\Bigr\}$

converges in distribution to $D_R[W(t) - t^{k+1}](c)$ as $n \to \infty$.

(iii) For $0 < \alpha < 1/(2k+1)$ and $A_{3k} = (2(k-1)!)^{1/3}|f(0)f^{(k)}(0)c^{k-1}|^{-1/3}$, the sequence

$A_{3k} n^{1/3+\alpha(k-1)/3}(\hat f_n(cn^{-\alpha}) - f(cn^{-\alpha}))$

converges in distribution to $D[W(t) - t^2](0)$ as $n \to \infty$.

Remark 3.1. In order to present the limiting distributions in Theorem 3.1 in the same way, they have been expressed in terms of slopes of least concave majorants. However, note that similar to the switching relation (2.1), one finds that

$D_R[W(t)](1) \stackrel{d}{=} \sqrt{\operatorname{argmax}_{t\in[0,\infty)}\{W(t) - t\}}$,

$D[W(t) - t^2](0) \stackrel{d}{=} 2\operatorname{argmax}_{t\in\mathbb{R}}\{W(t) - t^2\}$.
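
The second identity can be checked by completing the square. The sketch below assumes only a switching relation for slopes of the LCM, analogous to (2.1), and the stationarity of the increments of two-sided Brownian motion; the substitution $s = t + x/2$ is introduced here for the example.

```latex
\begin{align*}
D[W(t)-t^{2}](0)\le x
  &\iff \operatorname*{argmax}_{t\in\mathbb{R}}\{W(t)-t^{2}-xt\}\le 0,\\
W(t)-t^{2}-xt
  &= W(t)-\Bigl(t+\tfrac{x}{2}\Bigr)^{2}+\tfrac{x^{2}}{4},\\
\operatorname*{argmax}_{t\in\mathbb{R}}\{W(t)-t^{2}-xt\}
  &\stackrel{d}{=}\operatorname*{argmax}_{s\in\mathbb{R}}\{W(s)-s^{2}\}-\tfrac{x}{2}.
\end{align*}
```

Hence the event on the first line has the same probability as $2\operatorname{argmax}_{s\in\mathbb{R}}\{W(s) - s^2\} \le x$.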

In studying the behavior of (1.2), we follow the line of reasoning described 
in Section 2. We start by establishing convergence in distribution of the 
relevant processes. It turns out that we have to distinguish between three 
cases concerning the rate at which $n^{-\alpha}$ tends to zero.

Lemma 3.1. Suppose $f$ satisfies (C1) and (C2) and let $W$ denote standard two-sided Brownian motion on $\mathbb{R}$. For $1/(2k+1) \le \alpha < 1$, $t \ge 0$ and $x \in \mathbb{R}$, define

$Z_{n1}(x,t) = n^{(1+\alpha)/2}(F_n(tn^{-\alpha}) - f(0)tn^{-\alpha}) - xt$.

(i) For $1/(2k+1) < \alpha < 1$, the process $\{Z_{n1}(x,t) : t \in [0,\infty)\}$ converges in distribution in the uniform topology on compacta to the process $\{W(f(0)t) - xt : t \in [0,\infty)\}$.

(ii) For $\alpha = 1/(2k+1)$, the process $\{Z_{n1}(x,t) : t \in [0,\infty)\}$ converges in distribution in the uniform topology on compacta to $\{W(f(0)t) - xt + f^{(k)}(0)t^{k+1}/(k+1)! : t \in [0,\infty)\}$.

(iii) For $0 < \alpha < 1/(2k+1)$, $b = (1 - 2\alpha(k-1))/3$, $t \ge -cn^{b-\alpha}$ and $x \in \mathbb{R}$, define

$Z_{n2}(x,t) = n^{(b+1)/2}(F_n(cn^{-\alpha} + tn^{-b}) - F_n(cn^{-\alpha}) - f(cn^{-\alpha})tn^{-b}) - xt$.

Then the process $\{Z_{n2}(x,t) : t \in [-cn^{b-\alpha},\infty)\}$ converges in distribution in the uniform topology on compacta to the process $\{W(f(0)t) - xt + c^{k-1}f^{(k)}(0)t^2/(2(k-1)!) : t \in \mathbb{R}\}$.

The next step is to use Theorem 2.1. The major difficulty is to verify con- 
dition (iii) of this theorem. The following lemma ensures that this condition 
is satisfied. 

Lemma 3.2. Let $f$ satisfy (C1) and (C2) and let $Z_{n1}$, $Z_{n2}$ and $b$ be defined as in Lemma 3.1.

(i) For $1/(2k+1) < \alpha < 1$ and $x > 0$, $\operatorname{argmax}_{t\in[0,\infty)} Z_{n1}(x,t) = O_p(1)$.

(ii) For $\alpha = 1/(2k+1)$ and $x \in \mathbb{R}$, $\operatorname{argmax}_{t\in[0,\infty)} Z_{n1}(x,t) = O_p(1)$.

(iii) For $0 < \alpha < 1/(2k+1)$ and $x \in \mathbb{R}$, $\operatorname{argmax}_{t\in[-cn^{b-\alpha},\infty)} Z_{n2}(x,t) = O_p(1)$.

With Lemmas 3.1 and 3.2 at hand, the proof of Theorem 3.1 consists of 
using the switching relation (2.1) and an application of Theorem 2.1. 

Proof of Theorem 3.1. (i) First note that by condition (C2),

$n^{(1-\alpha)/2}(\hat f_n(cn^{-\alpha}) - f(cn^{-\alpha})) = n^{(1-\alpha)/2}(\hat f_n(cn^{-\alpha}) - f(0)) + O(n^{(1-(2k+1)\alpha)/2})$,

where $(1 - (2k+1)\alpha)/2 < 0$. For $x > 0$, according to (2.1),

(3.1)  $P\{n^{(1-\alpha)/2}(\hat f_n(cn^{-\alpha}) - f(0)) \le x\} = P\{n^{\alpha}U_n(f(0) + xn^{-(1-\alpha)/2}) \le c\}$.

If $Z_{n1}$ is the process defined in Lemma 3.2(i), then

(3.2)  $0 \le n^{\alpha}U_n(f(0) + xn^{-(1-\alpha)/2}) = \operatorname{argmax}_{t\in[0,\infty)} Z_{n1}(x,t) = O_p(1)$,

where, according to Lemma 3.1, the process $\{Z_{n1}(x,t) : t \in [0,\infty)\}$ converges in distribution to the process $\{W(f(0)t) - xt : t \in [0,\infty)\}$. To apply Theorem 2.1, we have to extend the above processes to the whole real line. Therefore define

$Z_{n1}(t) = Z_{n1}(x,t)$ for $t \ge 0$, and $Z_{n1}(t) = t$ for $t < 0$.

Then for $x$ fixed, $Z_{n1}$ converges in distribution to the process $Z_1$, where

$Z_1(t) = W(f(0)t) - xt$ for $t \ge 0$, and $Z_1(t) = t$ for $t < 0$.

Moreover, since $Z_{n1}(x,0) = 0$, together with (3.2), it follows that

$\operatorname{argmax}_{t\in\mathbb{R}} Z_{n1}(t) = \operatorname{argmax}_{t\in[0,\infty)} Z_{n1}(x,t) = n^{\alpha}U_n(f(0) + xn^{-(1-\alpha)/2}) = O_p(1)$.

The process $Z_1$ is continuous, and since $\operatorname{Var}(Z_1(s) - Z_1(t)) \ne 0$ for $s, t > 0$ with $s \ne t$, it follows from Lemma 2.6 in [7] that $Z_1$ has a unique maximum with probability 1. By an application of the law of the iterated logarithm for Brownian motion,

(3.3)  $P\Bigl\{\limsup_{|u|\to\infty} \frac{W(u)}{\sqrt{2|u|\log\log|u|}} = 1\Bigr\} = 1$,

it can be seen that $Z_1(t) \to -\infty$ as $|t| \to \infty$. Theorem 2.1 now yields that $\operatorname{argmax}_{t\in\mathbb{R}} Z_{n1}(t)$ converges in distribution to

$\operatorname{argmax}_{t\in\mathbb{R}} Z_1(t) = \operatorname{argmax}_{t\ge 0}\{W(f(0)t) - xt\}$.

Using (3.1) together with (2.3), this implies that

$P\{n^{(1-\alpha)/2}(\hat f_n(cn^{-\alpha}) - f(0)) \le x\} = P\{\operatorname{argmax}_{t\in\mathbb{R}} Z_{n1}(t) \le c\}$
$\to P\{\operatorname{argmax}_{t\ge 0}\{W(f(0)t) - xt\} \le c\}$
$= P\{\operatorname{argmax}_{t\ge 0}\{W(t) - x(c/f(0))^{1/2}t\} \le 1\}$.

Similar to the switching relation (2.1), the right-hand side equals

$P\{(f(0)/c)^{1/2}D_R[W(t)](1) \le x\}$,

so that it remains to show that $P\{n^{(1-\alpha)/2}(\hat f_n(cn^{-\alpha}) - f(0)) \le 0\} \to 0$. But this is evident, as for any $\varepsilon > 0$, using (2.3) once more,

$P\{n^{(1-\alpha)/2}(\hat f_n(cn^{-\alpha}) - f(0)) \le 0\} \le P\{n^{(1-\alpha)/2}(\hat f_n(cn^{-\alpha}) - f(0)) \le \varepsilon\}$
$\to P\{\operatorname{argmax}_{t\ge 0}\{W(f(0)t) - \varepsilon t\} \le c\}$
$= P\{\operatorname{argmax}_{t\ge 0}\{W(t) - t\} \le c\varepsilon^2/f(0)\}$.

When $\varepsilon \downarrow 0$, the right-hand side tends to zero, which can be seen from

$P\Bigl\{\limsup_{u\downarrow 0} \frac{W(u)}{\sqrt{2u\log\log(1/u)}} = 1\Bigr\} = 1$.

This proves (i).

(ii) First note that by (C2),

$n^{k/(2k+1)}(\hat f_n(cB_{2k}n^{-1/(2k+1)}) - f(cB_{2k}n^{-1/(2k+1)})) + \frac{f^{(k)}(0)(cB_{2k})^k}{k!} = n^{k/(2k+1)}(\hat f_n(cB_{2k}n^{-1/(2k+1)}) - f(0)) + o(1)$,

and that according to (2.1), $P\{n^{k/(2k+1)}(\hat f_n(cB_{2k}n^{-1/(2k+1)}) - f(0)) \le x\}$ is equal to

$P\{B_{2k}^{-1}n^{1/(2k+1)}U_n(f(0) + xn^{-k/(2k+1)}) \le c\}$.

With $Z_{n1}$ being the process defined in Lemma 3.1 with $\alpha = 1/(2k+1)$, we get

$B_{2k}^{-1}n^{1/(2k+1)}U_n(f(0) + xn^{-k/(2k+1)}) = \operatorname{argmax}_{t\in[0,\infty)}\{Z_{n1}(x,B_{2k}t)\} = O_p(1)$.

Again we first extend the above process to the whole real line:

$Z_{n1}(t) = Z_{n1}(x,B_{2k}t)$ for $t \ge 0$, and $Z_{n1}(t) = t$ for $t < 0$.

Then, according to Lemma 3.1, $Z_{n1}$ converges in distribution to the process

$Z_2(t) = W(f(0)B_{2k}t) - B_{2k}xt + f^{(k)}(0)B_{2k}^{k+1}t^{k+1}/(k+1)!$ for $t \ge 0$, and $Z_2(t) = t$ for $t < 0$.

Similar to the proof of (i), it follows from Theorem 2.1 that $\operatorname{argmax}_t Z_{n1}(t)$ converges in distribution to $\operatorname{argmax}_t Z_2(t)$. This implies that

$P\{A_{2k}n^{k/(2k+1)}(\hat f_n(cB_{2k}n^{-1/(2k+1)}) - f(0)) \le x\}$
$\to P\{\operatorname{argmax}_{t\ge 0}\{W(t) - xt - t^{k+1}\} \le c\}$
$= P\{D_R[W(t) - t^{k+1}](c) \le x\}$,

by means of Brownian scaling similar to (2.3), and a switching relation similar to (2.1).

(iii) According to (2.1), we have

(3.4)  $P\{n^{(1-b)/2}(\hat f_n(cn^{-\alpha}) - f(cn^{-\alpha})) \le x\} = P\{n^{b}(U_n(f(cn^{-\alpha}) + xn^{-(1-b)/2}) - cn^{-\alpha}) \le 0\}$,

and with $Z_{n2}$ as defined in Lemma 3.2(iii), we get

$n^{b}(U_n(f(cn^{-\alpha}) + xn^{-(1-b)/2}) - cn^{-\alpha}) = \operatorname{argmax}_{t\in[-cn^{b-\alpha},\infty)} Z_{n2}(x,t) = O_p(1)$.

As in the proof of (i) and (ii), we extend the above process to the whole real line:

$Z_{n2}(t) = Z_{n2}(x,t)$ for $t \ge -cn^{b-\alpha}$, and $Z_{n2}(t) = Z_{n2}(x,-cn^{b-\alpha}) + (t + cn^{b-\alpha})$ for $t < -cn^{b-\alpha}$.

Then by Lemma 3.1 $Z_{n2}$ converges in distribution to the process $Z_3$, where

$Z_3(t) = W(f(0)t) - xt + \frac{c^{k-1}f^{(k)}(0)}{2(k-1)!}t^2$,  $t \in \mathbb{R}$.

Similar to the proofs of (i) and (ii), it follows from Theorem 2.1 that $\operatorname{argmax}_t Z_{n2}(t)$ converges in distribution to $\operatorname{argmax}_t Z_3(t)$. Together with (3.4), this implies that

$P\{n^{(1-b)/2}A_{3k}(\hat f_n(cn^{-\alpha}) - f(cn^{-\alpha})) \le x\}$
$\to P\Bigl\{\operatorname{argmax}_{t\in\mathbb{R}}\Bigl\{W(f(0)t) - A_{3k}^{-1}xt + \frac{c^{k-1}f^{(k)}(0)}{2(k-1)!}t^2\Bigr\} \le 0\Bigr\}$
$= P\{\operatorname{argmax}_{t\in\mathbb{R}}\{W(t) - xt - t^2\} \le 0\}$
$= P\{D[W(t) - t^2](0) \le x\}$,

again using Brownian scaling similar to (2.3), and a switching relation similar to (2.1). □

4. Behavior near the end of the support. Suppose that $f$ has compact support and, without loss of generality, assume this to be the interval $[0,1]$. In this section we investigate the behavior of $\hat f_n$ near 1. Although there seems to be no simple symmetry argument to derive the behavior near 1 from the results in Section 3, the arguments to obtain the behavior of

$n^{\beta}\{f(1 - n^{-\alpha}) - \hat f_n(1 - n^{-\alpha})\}$

are similar to the ones used in studying (1.2). If $f(1) > 0$, then $\hat f_n(1)$ will always underestimate $f(1)$, since by definition $\hat f_n(1) = 0$. Nevertheless, the behavior near the end of the support is similar to the behavior near zero. For this reason, we only provide the statement of a theorem for the end of the support, which is analogous to Theorem 3.1. For details on the proof we refer to [9]. Motivations for studying the behavior near the end of the support are not as strong as for the behavior near zero. However, the behavior near 1 is required for establishing the asymptotic normality of the $L_k$-distance between $\hat f_n$ and $f$. Similar to (C1) and (C2) we will assume that:

(C3) $0 < f(1) = \lim_{x\uparrow 1} f(x) < \infty$.

(C4) For some $k \ge 1$, $0 < |f^{(k)}(1)| \le \sup_{0\le s\le 1}|f^{(k)}(s)| < \infty$, with $f^{(k)}(1) = \lim_{x\uparrow 1} f^{(k)}(x)$ and $f^{(i)}(1) = 0$ for $1 \le i \le k-1$.

We then have the following theorem. 

Theorem 4.1. Suppose $f$ satisfies conditions (C3) and (C4) and $c > 0$. Then:

(i) For $1/(2k+1) < \alpha < 1$ and $A_1 = (c/f(1))^{1/2}$, the sequence

$A_1 n^{(1-\alpha)/2}(f(1 - cn^{-\alpha}) - \hat f_n(1 - cn^{-\alpha}))$

converges in distribution to $D_R[W(t)](1)$ as $n \to \infty$.

(ii) For $\alpha = 1/(2k+1)$, $B_{2k} = (f(1)^{1/2}|f^{(k)}(1)|^{-1}(k+1)!)^{2/(2k+1)}$ and $A_{2k} = \sqrt{B_{2k}/f(1)}$, the sequence

$A_{2k}\Bigl\{n^{k/(2k+1)}(f(1 - cB_{2k}n^{-\alpha}) - \hat f_n(1 - cB_{2k}n^{-\alpha})) - \frac{(-cB_{2k})^k f^{(k)}(1)}{k!}\Bigr\}$

converges in distribution to $D_R[W(t) - t^{k+1}](c)$ as $n \to \infty$.

(iii) For $0 < \alpha < 1/(2k+1)$ and $A_{3k} = ((k-1)!)^{1/3}|4f(1)f^{(k)}(1)c^{k-1}|^{-1/3}$, the sequence

$A_{3k} n^{1/3+\alpha(k-1)/3}(f(1 - cn^{-\alpha}) - \hat f_n(1 - cn^{-\alpha}))$

converges in distribution to $D[W(t) - t^2](0)$ as $n \to \infty$.

Proof. The proof is similar to that of Theorem 3.1. We briefly sketch the proof for case (i); details can be found in [9].

Similar to the proof of Theorem 3.1(i), it suffices to consider

$n^{(1-\alpha)/2}(f(1) - \hat f_n(1 - cn^{-\alpha}))$.

For $x > 0$, according to (2.1),

(4.1)  $P\{n^{(1-\alpha)/2}(f(1) - \hat f_n(1 - cn^{-\alpha})) < x\} = P\{n^{\alpha}(1 - U_n(f(1) - xn^{-(1-\alpha)/2})) < c\}$.

We have that $n^{\alpha}(1 - U_n(f(1) - xn^{-(1-\alpha)/2})) = \operatorname{argmax}_{t\in[0,n^{\alpha}]} Y_{n1}(x,t)$, where the process

$Y_{n1}(x,t) = n^{(1+\alpha)/2}(F_n(1 - tn^{-\alpha}) - F_n(1) + f(1)tn^{-\alpha}) - xt$

converges in distribution to the process $\{W(f(1)t) - xt : t \in [0,\infty)\}$. From here on, the proof proceeds in exactly the same manner as that of Theorem 3.1(i). We conclude that for $x > 0$,

$P\{n^{(1-\alpha)/2}(f(1) - \hat f_n(1 - cn^{-\alpha})) < x\} = P\{\operatorname{argmax}_{0\le t\le n^{\alpha}} Y_{n1}(x,t) < c\}$
$\to P\{\operatorname{argmax}_{t\ge 0}\{W(f(1)t) - xt\} < c\}$
$= P\{\operatorname{argmax}_{t\ge 0}\{W(t) - x(c/f(1))^{1/2}t\} < 1\}$.

By (2.1), the right-hand side equals $P\{(f(1)/c)^{1/2}D_R[W(t)](1) < x\}$, and similar to the proof of Theorem 3.1(i) it follows that $P\{n^{(1-\alpha)/2}(f(1) - \hat f_n(1 - cn^{-\alpha})) < 0\} \to 0$. This proves (i). □

5. A comparison with the penalized NPMLE. Consider a decreasing density $f$ on $[0,\infty)$. We first consider the case where $f'(0) < 0$. As pointed out in [13], the NPMLE $\hat f_n$ for $f$ is not consistent at zero. They proposed a penalized NPMLE $\hat f_n^P(\alpha_n,0)$, and in Sun and Woodroofe [12] they show that

$n^{1/3}\{\hat f_n^P(\alpha_n,0) - f(0)\} \to \sup_{t>0} \frac{W(t) - (c - (1/2)f(0)f'(0)t^2)}{t}$,

where $c$ is related to the smoothing parameter $\alpha_n = cn^{-2/3}$. Sun and Woodroofe [12] also provide (to some extent) an adaptive choice for $c$ that leads to an estimate $\hat\alpha_n$ of the smoothing parameter, and report some results of a simulation experiment for $\hat f_n^P(\hat\alpha_n,0)$.

We propose two consistent estimators of $f(0)$, both converging at rate $n^{1/3}$. A simple estimator is $\hat f_n^S(0) = \hat f_n(n^{-1/3})$. This estimator is straightforward and does not have any additional smoothing parameters. According to Theorem 3.1(ii), $\hat f_n^S(0)$ is a consistent estimator for $f(0)$, converging at rate $n^{1/3}$. It has a limiting distribution that is a functional of $W$,

$A_{21}n^{1/3}\{\hat f_n^S(0) - f(0)\} \to D_R[W(t) - t^2](1/B_{21})$,

where $A_{21}$ and $B_{21}$ are defined in Theorem 3.1(ii). In order to reduce the mean squared error, we also propose an adaptive estimator

$\hat f_n^A(0) = \hat f_n(c_1^*\hat B_{21}n^{-1/3})$

for $f(0)$. Here $c_k^*$ is the value that minimizes $E(D_R[W(t) - t^{k+1}](c))^2$, and $\hat B_{21}$ is an estimate for the constant $B_{21}$ in Theorem 3.1(ii). Computer simulations show that $c_k^* \approx 0.345$ for both $k = 1$ and $k = 2$. We take

$\hat B_{21} = 4^{1/3}\hat f_n^S(0)^{1/3}|\hat f_n'(0)|^{-2/3}$,

where

$\hat f_n'(0) = \min(n^{1/6}(\hat f_n(n^{-1/6}) - \hat f_n(n^{-1/3})),\, -n^{-1/3})$

is an estimate for $f'(0)$. As we have seen above, $\hat f_n^S(0)$ is consistent for $f(0)$, and according to Theorem 3.1, $\hat f_n'(0)$ is consistent for $f'(0)$. When $f$ is twice continuously differentiable, it converges at rate $n^{1/6}$. Therefore $\hat B_{21}$ is consistent for $B_{21}$ and $\hat f_n^A(0)$ is a consistent estimator of $f(0)$, converging with rate $n^{1/3}$. It has the limit behavior

$A_{21}n^{1/3}\{\hat f_n^A(0) - f(0)\} \to D_R[W(t) - t^2](c_1^*)$,

where $A_{21}$ is defined in Theorem 3.1(ii).
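
The two estimators for the case $f'(0) < 0$ can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: observations are assumed distinct and strictly positive, the helper names (`lcm_of_ecdf`, `f_hat`, `simple_estimator`, `adaptive_estimator`) are hypothetical, and the constant 0.345 is the simulated value of $c_1^*$ quoted in the text.

```python
import numpy as np

C_STAR = 0.345  # simulated value of c_1^* quoted in the text


def lcm_of_ecdf(sample):
    """Vertices and slopes of the least concave majorant of the ECDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    hull = [0]
    for i in range(1, n + 1):
        # Drop the last vertex while it lies on or below the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (py[b] - py[a]) * (px[i] - px[b]) <= (py[i] - py[b]) * (px[b] - px[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    return knots, np.diff(py[hull]) / np.diff(knots)


def f_hat(x, knots, slopes):
    """Grenander estimator: left derivative of the LCM at x."""
    if x > knots[-1]:
        return 0.0
    j = int(np.searchsorted(knots, x, side="left")) - 1
    return float(slopes[max(j, 0)])


def simple_estimator(sample):
    """Simple estimator: the Grenander estimator evaluated at n^(-1/3)."""
    n = len(sample)
    knots, slopes = lcm_of_ecdf(sample)
    return f_hat(n ** (-1.0 / 3.0), knots, slopes)


def adaptive_estimator(sample, c_star=C_STAR):
    """Adaptive estimator: evaluate at c_star * B21_hat * n^(-1/3)."""
    n = len(sample)
    knots, slopes = lcm_of_ecdf(sample)
    f_s = f_hat(n ** (-1.0 / 3.0), knots, slopes)
    # Estimate of f'(0), truncated away from zero as in the text.
    d_hat = min(
        n ** (1.0 / 6.0) * (f_hat(n ** (-1.0 / 6.0), knots, slopes) - f_s),
        -(n ** (-1.0 / 3.0)),
    )
    b21_hat = 4.0 ** (1.0 / 3.0) * f_s ** (1.0 / 3.0) * abs(d_hat) ** (-2.0 / 3.0)
    return f_hat(c_star * b21_hat * n ** (-1.0 / 3.0), knots, slopes)


rng = np.random.default_rng(2)
sample = rng.exponential(size=5000)   # standard exponential, so f(0) = 1
print(simple_estimator(sample), adaptive_estimator(sample))
```

Both estimators reuse the same LCM, so the extra cost of the adaptive version is two additional evaluations of the Grenander estimator.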

We simulated 10,000 samples of sizes $n = 50$, 100, 200 and 10,000 from a standard exponential distribution with mean 1. For each sample, the values of $n^{1/3}\{\hat f_n^S(0) - f(0)\}$, $n^{1/3}\{\hat f_n^A(0) - f(0)\}$ and $n^{1/3}\{\hat f_n^P(\hat\alpha_n,0) - f(0)\}$ were computed. The value of $\hat\alpha_n$ was computed as proposed in [12], $\hat\alpha_n = 0.649\,\hat\beta_n^{-1/3}n^{-2/3}$, where

$\hat\beta_n = \max\Bigl\{\hat f_n^P(\alpha_0,0)\,\frac{\hat f_n^P(\alpha_0,0) - \hat f_n^P(\alpha_0,x_m)}{2x_m},\; n^{-q}\Bigr\}$

is an estimate of $\beta = -f(0)f'(0)/2$. Here $x_m$ denotes the second point of jump of the penalized NPMLE $\hat f_n^P(\alpha_0,\cdot)$ computed with smoothing parameter $\alpha_0$. The parameter $\alpha_0 = c_0n^{-2/3}$, and $q$ should be taken between 0 and 0.5. However, Sun and Woodroofe [12] do not specify how to choose $q$ and $c_0$ in general. We took $q = 1/3$, and for $\alpha_0$ the values as listed in their Table 2: $\alpha_0 = 0.0516$, 0.0325 and 0.0205 for sample sizes $n = 50$, 100 and 200. For sample size $n = 10{,}000$ we took the theoretical optimal value $\alpha_0 = 0.649\,\beta^{-1/3}n^{-2/3}$, with $\beta = 0.5$. It is worth noticing that Sun and Woodroofe [12] do not optimize the MSE, but $n^{1/3}E|\hat f_n^P(\alpha_n,0) - f(0)|$. Nevertheless, computer simulations show that the $\alpha_n$ minimizing the MSE is approximately the same and that $n^{2/3}E|\hat f_n^P(\alpha,0) - f(0)|^2$ is a very flat function in a neighborhood of $\alpha_n$. A similar property holds for the value $c_k^*$ minimizing the AMSE of our estimator.

In Table 1 we list simulated values for the mean, variance and mean squared error of the three estimators. The penalized NPMLE is less biased, but has a larger variance. Estimator $\hat f_n^A(0)$ performs better in the sense of mean squared error, approaching the best theoretically expected performance. It is also remarkable how well it mimics its limiting distribution for small samples. Estimator $\hat f_n^S(0)$ performs a little worse than $\hat f_n^A(0)$, having the largest bias, but the smallest variance.

If $k = 2$ in condition (C2), it is possible to estimate $f(0)$ at a rate faster than $n^{1/3}$. If it is known in advance that $k = 2$, we can produce two consistent estimators of $f(0)$ converging at rate $n^{2/5}$. Similar to the previous case, a simple estimator is $\hat f_n^{S,2}(0) = \hat f_n(n^{-1/5})$. It is a consistent estimator of $f(0)$, converging at rate $n^{2/5}$, and has the limit behavior

$A_{22}n^{2/5}\{\hat f_n^{S,2}(0) - f(0)\} \to D_R[W(t) - t^3](1/B_{22})$,

where $A_{22}$ and $B_{22}$ are defined in Theorem 3.1(ii). Again, we propose an adaptive estimator $\hat f_n^{A,2}(0) = \hat f_n(c_2^*\hat B_{22}n^{-1/5})$ for $f(0)$, where $\hat B_{22}$ is an estimate for the constant $B_{22} = 36^{1/5}f(0)^{1/5}|f''(0)|^{-2/5}$ in Theorem 3.1(ii), and $c_2^* \approx 0.345$ is the value that minimizes $E(D_R[W(t) - t^3](c))^2$. We take $\hat B_{22} = 36^{1/5}\hat f_n^{S,2}(0)^{1/5}|\hat f_n''(0)|^{-2/5}$, where we estimate $f''(0)$ by $\hat f_n''(0) = \min(2n^{1/4}(\hat f_n(n^{-1/8}) - \hat f_n(n^{-1/5})),\, -n^{-1/5})$. As we have seen above, $\hat f_n^{S,2}(0)$ is consistent for $f(0)$, and according to Theorem 3.1, $\hat f_n''(0)$ is consistent for $f''(0)$ with rate $n^{1/8}$ if $f$ is three times continuously differentiable. Therefore $\hat B_{22}$ is a consistent estimator for $B_{22}$ and $\hat f_n^{A,2}(0)$ is a consistent estimator of $f(0)$, converging with rate $n^{2/5}$:

$A_{22}n^{2/5}\{\hat f_n^{A,2}(0) - f(0)\} \to D_R[W(t) - t^3](c_2^*)$,

where $A_{22}$ is defined in Theorem 3.1(ii).

Table 1
Simulated mean, variance and mean squared error for the three estimators at the standard exponential distribution

Estimator                                              n = 50      100      200   10,000
$n^{1/3}\{\hat f_n^S(0) - f(0)\}$               Mean   -0.847   -0.853   -0.868   -0.917
                                                Var     0.439    0.484    0.536    0.700
                                                MSE     1.157    1.211    1.289    1.541
$n^{1/3}\{\hat f_n^A(0) - f(0)\}$               Mean   -0.738   -0.777   -0.793   -0.643
                                                Var     0.934    0.742    0.807    1.045
                                                MSE     1.478    1.345    1.436    1.458
$n^{1/3}\{\hat f_n^P(\hat\alpha_n,0) - f(0)\}$  Mean   -0.072   -0.079   -0.075   -0.195
                                                Var     1.296    1.530    1.732    1.913
                                                MSE     1.301    1.537    1.738    1.951

We simulated 10,000 samples of sizes $n = 50$, 100, 200 and 10,000 from a half-normal distribution. For each sample, the values of $n^{2/5}\{\hat f_n^{S,2}(0) - f(0)\}$ and $n^{2/5}\{\hat f_n^{A,2}(0) - f(0)\}$ were computed. Sun and Woodroofe [12] do not consider the possibility of constructing a special estimator for the case $k = 2$, though we believe that this is also possible with a penalization technique. In Table 2 we list simulated values for the mean, variance and mean squared error of both estimators. The simple estimator is more biased but its variance is smaller than the variance of the adaptive one.

If it is not known in advance that $k = 2$, then application of estimators $\hat f_n^{S,2}(0)$ and $\hat f_n^{A,2}(0)$ is undesirable. If in fact $k = 1$, they are still consistent, but their convergence rate will be $n^{1/5}$. On the other hand, when $k = 2$, then $\hat f_n^S(0)$, $\hat f_n^A(0)$ and $\hat f_n^P(\hat\alpha_n,0)$ are still applicable. In that case, according to Theorem 3.1(i), $\hat f_n^S(0)$ is a consistent estimator of $f(0)$ converging at rate $n^{1/3}$, such that

$n^{1/3}\{\hat f_n^S(0) - f(0)\} \to \sqrt{f(0)}\,D_R[W(t)](1)$.

Also $\hat f_n^A(0)$ is still consistent for $f(0)$ in case $k = 2$, but now at rate $n^{7/18}$. This can be seen as follows. Since $f'(0) = 0$, it follows that

$n^{1/6}\hat f_n'(0) \to -\sqrt{f(0)}\,D_R[W(t)](1) + \tfrac{1}{2}f''(0)$.

As $\hat f_n^S(0) = f(0) + O_p(n^{-1/3})$, this implies that $\hat B_{21}n^{-1/3} = O_p(n^{-2/9})$. Application of Theorem 3.1(i) yields that $\hat f_n^A(0) = f(0) + O_p(n^{-7/18})$. Sun and Woodroofe [12] also propose to use $\hat f_n^P(\hat\alpha_n,0)$ as an estimate of $f(0)$ in the case $k \ge 2$. They establish the limit behavior of $n^{1/3}\{\hat f_n^P(\hat\alpha_n,0) - f(0)\}$ in that case [see their Theorem 1(ii) on page 146].

Table 2
Simulated mean, variance and mean squared error for both estimators at the half-normal distribution

Estimator                                      n = 50      100      200   10,000
$n^{2/5}\{\hat f_n^{S,2}(0) - f(0)\}$   Mean   -0.429   -0.437   -0.440   -0.419
                                        Var     0.371    0.402    0.440    0.559
                                        MSE     0.555    0.592    0.634    0.735
$n^{2/5}\{\hat f_n^{A,2}(0) - f(0)\}$   Mean   -0.252   -0.278   -0.373   -0.326
                                        Var     0.459    0.502    0.549    0.747
                                        MSE     0.523    0.579    0.688    0.853

We simulated 10,000 samples of sizes $n = 50$, 100, 200 and 10,000 from a standard half-normal distribution. For each sample the values of $n^{1/3}\{\hat f_n^S(0) - f(0)\}$, $n^{1/3}\{\hat f_n^A(0) - f(0)\}$ and $n^{1/3}\{\hat f_n^P(\hat\alpha_n,0) - f(0)\}$ were computed. In Table 3 we list simulated values for the mean, variance and mean squared error of the three estimators. The simple estimator has the smallest variance, but as the sample size increases it becomes more biased. Nevertheless, it is stable for small sample sizes. The adaptive estimator becomes more biased with growing sample size, but with smaller MSE. The penalized MLE is most biased, also having a much larger variance than its simple and adaptive competitors.

Finally, in Table 4 we list the true limiting values for the mean, variance and MSE, for all estimators at the exponential and half-normal distributions. The finite sample behavior of the simple estimators $\hat f_n^S(0)$ (see Tables 1 and 3) and $\hat f_n^{S,2}(0)$ (see Table 2) reasonably matches the theoretical behavior. The adaptive estimators exhibit larger deviations from their theoretical values. This is probably explained by the fact that even for larger sample sizes, the estimation of the derivatives of $f$ in $B_{2k}$ still has a large influence.

One might prefer a scale-equivariant version of the above estimators. One 
possibility is f n (X m:n ), where X m:n denotes the mth order statistic. The 
sequence m = m{n) should be chosen in such a way that m(n) — > oo and 

Table 3

Simulated mean, variance and mean squared error for the three
estimators at the half-normal distribution

                                                  n = 50      100      200   10,000
$n^{1/3}\{f_n^S(0) - f(0)\}$              Mean     0.012    0.058    0.104    0.269
                                          Var      0.320    0.317    0.316    0.296
                                          MSE      0.320    0.320    0.327    0.368
$n^{1/3}\{f_n^A(0) - f(0)\}$              Mean     0.046    0.073    0.091    0.204
                                          Var      0.475    0.406    0.383    0.319
                                          MSE      0.477    0.412    0.391    0.361
$n^{1/3}\{\hat f_n^P(\hat\alpha_n, 0) - f(0)\}$  Mean     0.331    0.336    0.338    0.279
                                          Var      0.659    0.742    0.812    0.714
                                          MSE      0.768    0.855    0.926    0.792


Table 4

Theoretical limiting mean, variance and mean squared error for the three estimators

                                                            Exponential               Half-normal
Estimator                                               Mean  Variance    MSE     Mean  Variance    MSE
$n^{1/3}\{f_n^S(0) - f(0)\}$                          -0.885    0.805   1.591    0.336    0.316   0.429
$n^{1/3}\{\hat f_n(c_1^* B_{21} n^{-1/3}) - f(0)\}$   -0.298    1.043   1.131
$n^{1/3}\{\hat f_n^P(\alpha_n, 0) - f(0)\}$           -0.349    1.096   1.218
$n^{2/5}\{f_n^{S,2}(0) - f(0)\}$                          -∞        ∞       ∞   -0.415    0.670   0.842
$n^{2/5}\{\hat f_n(c_2^* B_{22} n^{-1/5}) - f(0)\}$       -∞        ∞       ∞   -0.140    0.718   0.737

$m(n)/n \to 0$, for example, $m(n) = \lfloor a n^{2/3} \rfloor$. In that case,
one can show that $\hat f_n(X_{m:n})$ is asymptotically equivalent to
$\hat f_n(a f(0)^{-1} n^{-1/3})$. Its limiting distribution can be obtained from
Theorem 3.1 and the AMSE optimal choice $a^*$ will depend on $f(0)$ and $f'(0)$.
For this choice, $\hat f_n(a^* f(0)^{-1} n^{-1/3})$ has the same behavior as
$\hat f_n(c_1^* B_{21} n^{-1/3})$. Another possibility is to estimate $f(0)$ by
means of a numerical derivative of $F_n$,

$$f_n^D(0) = \frac{F_n(X_{m:n})}{X_{m:n}} = \frac{m/n}{X_{m:n}},$$

where $m = m(n)$ as above. It can be shown that $n^{1/3}\{f_n^D(0) - f(0)\}$ is
asymptotically normal with mean $-|f'(0)|a/(2f(0))$ and variance $f(0)^2/a$.
This implies that the minimal AMSE is a multiple of $(f(0)|f'(0)|)^{2/3}$, which
also holds for $f_n^S(0)$ and $f_n^A(0)$ [see Theorem 3.1(ii) for the case
$k = 1$]. Computer simulations show that the AMSE of $f_n^A(0)$ is always the
smallest of the three.
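
The two scale-equivariant proposals above can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function names are hypothetical, the Grenander estimator is computed as the left derivative of the least concave majorant of the empirical distribution function through a standard upper-hull scan, and the AMSE formulas simply combine the asymptotic mean $-|f'(0)|a/(2f(0))$ and variance $f(0)^2/a$ quoted in the text. Setting the derivative of the AMSE in $a$ to zero gives $a^* = (2f(0)^4/f'(0)^2)^{1/3}$ and a minimal value $\tfrac34 2^{2/3}(f(0)|f'(0)|)^{2/3}$.

```python
import bisect
import math
import random


def lcm_vertices(sample):
    """Vertices of the least concave majorant of the empirical CDF (positive data)."""
    xs = sorted(sample)
    n = len(xs)
    points = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = []
    for px, py in points:
        # Drop the last vertex while it lies on or below the chord to the new point.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return hull


def grenander_at(sample, x):
    """Left derivative of the LCM at x > 0; zero beyond the largest observation."""
    if x <= 0:
        raise ValueError("x must be positive")
    hull = lcm_vertices(sample)
    knots = [vx for vx, _ in hull]
    j = bisect.bisect_left(knots, x)  # index of the first knot >= x
    if j == len(hull):
        return 0.0
    (x1, y1), (x2, y2) = hull[j - 1], hull[j]
    return (y2 - y1) / (x2 - x1)


def numerical_derivative_estimate(sample, a):
    """f_n^D(0) = (m/n) / X_{m:n} with m = floor(a * n^(2/3)), at least 1."""
    n = len(sample)
    m = max(1, math.floor(a * n ** (2.0 / 3.0)))
    x_m = sorted(sample)[m - 1]  # m-th order statistic
    return (m / n) / x_m


def amse(a, f0, df0):
    """Squared asymptotic mean plus asymptotic variance of n^(1/3){f_n^D(0) - f(0)}."""
    return (abs(df0) * a / (2.0 * f0)) ** 2 + f0 ** 2 / a


def optimal_a(f0, df0):
    """Stationary point of amse in a: a^3 = 2 f(0)^4 / f'(0)^2."""
    return (2.0 * f0 ** 4 / df0 ** 2) ** (1.0 / 3.0)


def minimal_amse(f0, df0):
    """Value of amse at optimal_a, a multiple of (f(0)|f'(0)|)^(2/3)."""
    return 0.75 * 2.0 ** (2.0 / 3.0) * (f0 * abs(df0)) ** (2.0 / 3.0)
```

With `m` chosen as in `numerical_derivative_estimate`, the order-statistic version of the Grenander estimator is `grenander_at(sample, sorted(sample)[m - 1])`.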



6. Proofs. 



Proof of Lemma 2.1. Let $x_0 = \operatorname{argmax}_{x \in K} f(x)$. If
$x_0 = \infty$, there is nothing left to prove; therefore assume that
$x_0 < \infty$.

(i) By definition of $x_0$ and the fact that $g$ is nonincreasing, for
$x > x_0$, we must have $f(x) + g(x) \le f(x_0) + g(x_0)$. Hence, we must have

$$\operatorname{argmax}_{x \in K}\{f(x) + g(x)\} \le x_0 = \operatorname{argmax}_{x \in K} f(x).$$

This proves (i).

(ii) If $(C + x_0, \infty) \cap K = \emptyset$, the statement is trivially true,
so only consider the case $(C + x_0, \infty) \cap K \ne \emptyset$. Then by
definition $f(x) \le f(x_0)$, for all $x \in (C + x_0, \infty) \cap K$, and by
the property of $g$ we also have $g(x) \le g(x_0)$, for
$x \in (C + x_0, \infty) \cap K$. This implies
$f(x) + g(x) \le f(x_0) + g(x_0)$, for all $x \in (C + x_0, \infty) \cap K$.
Hence, we must have

$$\operatorname{argmax}_{x \in K}\{f(x) + g(x)\} \le C + x_0 = C + \operatorname{argmax}_{x \in K} f(x).$$

This proves the lemma. □
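
Both parts of the lemma are easy to illustrate on a finite grid. The sketch below is an illustration only: the grid, the functions and the constant `C` are choices made here, and the argmax is taken to be the largest maximizer, which is an assumption about the convention. The perturbation `g_wiggly` is not monotone, but it satisfies $g(x) \ge g(y)$ whenever $y - x \ge C$ with $C = 1.25$, because $0.5(y - x) \ge 0.625$ exceeds the maximal oscillation $0.6$.

```python
import math


def argmax_on_grid(values, grid):
    """Largest grid point at which `values` attains its maximum."""
    best = max(values)
    return max(x for x, v in zip(grid, values) if v == best)


# K = {0, 0.01, ..., 10}; f has its unique maximum at 3.
grid = [i / 100.0 for i in range(0, 1001)]
f = [-(x - 3.0) ** 2 for x in grid]

# Part (i): g nonincreasing.
g_monotone = [-0.5 * x for x in grid]

# Part (ii): g(x) >= g(y) whenever y - x >= C.
C = 1.25
g_wiggly = [-0.5 * x + 0.3 * math.sin(20.0 * x) for x in grid]

x0 = argmax_on_grid(f, grid)
x1 = argmax_on_grid([u + v for u, v in zip(f, g_monotone)], grid)
x2 = argmax_on_grid([u + v for u, v in zip(f, g_wiggly)], grid)

if __name__ == "__main__":
    print("argmax f            :", x0)
    print("argmax f + g (i)    :", x1, "<=", x0)
    print("argmax f + g (ii)   :", x2, "<=", x0 + C)
```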

Proof of Lemma 3.1. Decompose the process as

(6.1)
$$Z_{n1}(x,t) = n^{a/2} W_n(F(tn^{-a})) + n^{(1+a)/2}\{F(tn^{-a}) - f(0)tn^{-a}\} - xt - n^{a/2} F(tn^{-a}) W_n(1) + n^{a/2} H_n(tn^{-a}),$$

where $H_n(t) = E_n(t) - B_n(F(t))$. By Brownian scaling, $n^{a/2} W_n(F(tn^{-a}))$
has the same distribution as the process $W(n^a F(tn^{-a}))$, and by uniform
continuity of Brownian motion on compacta,

$$W(n^a F(tn^{-a})) - W(f(0)t) \to 0,$$

uniformly for $t$ in compact sets. Since $a > 1/(2k+1)$ we have that, for some
$0 \le \theta \le tn^{-a}$,

$$n^{(1+a)/2}\{F(tn^{-a}) - f(0)tn^{-a}\} = n^{(1+a)/2} \frac{f^{(k)}(\theta)}{(k+1)!} (tn^{-a})^{k+1} \to 0,$$

uniformly for $t$ in compact sets. Because $n^{a/2} F(tn^{-a}) W_n(1) = O_p(n^{-a/2})$,
together with (2.2) this proves (i). In case (ii), where $a = 1/(2k+1)$, the
only difference is the behavior of the deterministic term

$$n^{(k+1)/(2k+1)}\{F(tn^{-1/(2k+1)}) - f(0)tn^{-1/(2k+1)}\} \to \frac{f^{(k)}(0)}{(k+1)!} t^{k+1},$$

uniformly for $t$ in compact sets. Similar to the proof of (i), using Brownian
scaling and uniform continuity of Brownian motion on compacta this
proves (ii).

For case (iii) the process $Z_{n2}$ can be written as

$$n^{b/2}\{W_n(F(cn^{-a} + tn^{-b})) - W_n(F(cn^{-a}))\}$$
$$+ n^{(b+1)/2}\{F(cn^{-a} + tn^{-b}) - F(cn^{-a}) - f(cn^{-a})tn^{-b}\} - xt$$
$$- n^{b/2}\{F(cn^{-a} + tn^{-b}) - F(cn^{-a})\} W_n(1)$$
$$+ n^{b/2} H_n(cn^{-a} + tn^{-b}) - n^{b/2} H_n(cn^{-a}).$$

The process $n^{b/2}\{W_n(F(cn^{-a} + tn^{-b})) - W_n(F(cn^{-a}))\}$ has the same
distribution as the process $W(n^b(F(cn^{-a} + tn^{-b}) - F(cn^{-a})))$, and by
uniform continuity of Brownian motion on compacta,

$$W(n^b(F(cn^{-a} + tn^{-b}) - F(cn^{-a}))) - W(f(0)t) \to 0,$$

uniformly for $t$ in compact sets. Finally, for some
$\theta_1 \in [cn^{-a}, cn^{-a} + tn^{-b}]$ and for some
$\theta_2 \in [0, cn^{-a} + tn^{-b}]$, it holds that

$$n^{(b+1)/2}\{F(cn^{-a} + tn^{-b}) - F(cn^{-a}) - f(cn^{-a})tn^{-b}\}
= n^{(1-3b)/2} \frac{f'(\theta_1)}{2} t^2
= n^{(1-3b)/2} \frac{f^{(k)}(\theta_2)}{2(k-1)!} \theta_1^{k-1} t^2
\to \frac{f^{(k)}(0)}{2(k-1)!} c^{k-1} t^2,$$

uniformly for $t$ in compact sets. Since

$$n^{b/2}\{F(cn^{-a} + tn^{-b}) - F(cn^{-a})\} W_n(1) = O_p(n^{-b/2}),$$

together with (2.2) this proves (iii). □
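
The two regimes for the deterministic term can be checked numerically. The sketch below assumes the standard exponential distribution, $F(x) = 1 - e^{-x}$, so that $f(0) = 1$, $f'(0) = -1$ and $k = 1$; this choice and the function name are made here for illustration. For $a = 1/3 = 1/(2k+1)$ the term should approach $f'(0)t^2/2 = -t^2/2$, while for $a = 1/2 > 1/3$ it should vanish.

```python
import math


def drift(n, a, t):
    """n^((1+a)/2) * {F(t n^-a) - f(0) t n^-a} for F(x) = 1 - exp(-x), f(0) = 1."""
    s = t * n ** (-a)
    # F(s) = -expm1(-s); expm1 keeps accuracy for very small s.
    return n ** ((1.0 + a) / 2.0) * (-math.expm1(-s) - s)


if __name__ == "__main__":
    t = 2.0
    for n in (1e3, 1e6, 1e9, 1e12):
        print(f"n = {n:.0e}: a = 1/3 -> {drift(n, 1.0 / 3.0, t):+.5f}, "
              f"a = 1/2 -> {drift(n, 0.5, t):+.5f}")
```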

To verify condition (iii) of Theorem 2.1 we need that $F(c+t) - F(c) - f(c)t$
is suitably bounded. The next lemma guarantees that this is the case.

Lemma 6.1. Suppose that $f$ satisfies (C2). Then there exists a value
$t_0 > 0$, such that $\inf|f^{(k)}| = \inf_{0 \le s \le t_0} |f^{(k)}(s)| > 0$.
For any $0 \le c < t_0/2$ we can bound $F(c+t) - F(c) - f(c)t$ by

(i) $-\frac{\inf|f^{(k)}|}{(k+1)!} t^{k+1}$, for $0 \le t \le t_0/2$,

(ii) $-\frac{\inf|f^{(k)}|}{(k+1)!} (t_0/2)^k t$, for $t > t_0/2$,

(iii) $-\frac{\inf|f^{(k)}|}{2(k-1)!} (c/2)^{k-1} t^2$, for $-c/2 < t < t_0/2$.

Furthermore, for small enough $c$ and for $-c \le t \le -c/2$,

(iv) $F(c+t) - F(c) - f(c)t \le -C_1 c^{k+1}$, where $C_1 > 0$ does not depend on
$c$ and $t$.



Proof. The existence of $t_0 > 0$ follows directly from condition (C2).
First note that if $f^{(k)}(0) \ne 0$, then we must have $f^{(k)}(0) < 0$, since
otherwise $f^{(k-1)}$ is increasing in a neighborhood of zero, which implies that
$f^{(k-2)}$ is increasing in a neighborhood of zero, and so on, which eventually
would imply that $f$ is increasing in a neighborhood of zero. Therefore, under
condition (C2) we must have $f^{(k)}(0) < 0$, which in turn implies that
$f^{(i)}(s) < 0$ for $0 < s < t_0$ and $i = 1, 2, \ldots, k$. Hence, for
$0 \le t \le t_0/2$, the inequality for $F(c+t) - F(c) - f(c)t$ is a direct
consequence of a Taylor expansion, where all negative terms except for the last
one are omitted.

For $t > t_0/2$, write

$$F(c+t) - F(c) - f(c)t = F(c + t_0/2) - F(c) - f(c)t_0/2$$
$$+ (f(c + t_0/2) - f(c))(t - t_0/2)$$
$$+ F(c+t) - F(c + t_0/2) - f(c + t_0/2)(t - t_0/2),$$

where $F(c+t) - F(c + t_0/2) - f(c + t_0/2)(t - t_0/2) \le 0$, because $f$ is
nonincreasing. By the same argument as above,
$F(c + t_0/2) - F(c) - f(c)t_0/2 \le f^{(k)}(\theta_1)(t_0/2)^{k+1}/(k+1)!$ and
$f(c + t_0/2) - f(c) \le f^{(k)}(\theta_2)(t_0/2)^k/k!$, for some
$c < \theta_1, \theta_2 < c + t_0/2$. This implies that for $t > t_0/2$, we can
bound $F(c+t) - F(c) - f(c)t$ from above by

$$-\frac{(t_0/2)^{k+1}}{(k+1)!} \inf|f^{(k)}| - \frac{(t_0/2)^k}{k!} \inf|f^{(k)}| (t - t_0/2)
\le -\frac{(t_0/2)^k}{(k+1)!} \inf|f^{(k)}| (t_0/2 + t - t_0/2)
= -\frac{(t_0/2)^k}{(k+1)!} \inf|f^{(k)}| \, t.$$

For $-c/2 < t < t_0/2$, first write $F(c+t) - F(c) - f(c)t = f'(\theta_4)t^2/2$, for
$c/2 < \theta_4 < c + t_0/2$. By condition (C2),
$f'(\theta_4) = f^{(k)}(\theta_5)\theta_4^{k-1}/(k-1)!$, for some
$0 < \theta_5 < \theta_4$. This means that

$$F(c+t) - F(c) - f(c)t = \frac{\theta_4^{k-1}}{2(k-1)!} f^{(k)}(\theta_5) t^2 \le -\frac{(c/2)^{k-1}}{2(k-1)!} \inf|f^{(k)}| \, t^2.$$

Finally, for $-c \le t \le -c/2$, first note that $f(c+t) - f(c) \ge 0$, so that
$F(c+t) - F(c) - f(c)t$ is nondecreasing in $t$. Write

$$F(c+t) - F(c) - f(c)t = \frac{f^{(k)}(\theta_6)}{(k+1)!}(c+t)^{k+1} - \frac{f^{(k)}(\theta_7)}{(k+1)!} c^{k+1} - \frac{f^{(k)}(\theta_8)}{k!} c^k t,$$

for $0 < \theta_6 < c + t$ and $0 < \theta_7, \theta_8 < c$. Because this
expression is nondecreasing for $-c \le t \le -c/2$, and since
$f^{(k)}(\theta_i) - f^{(k)}(0) = o(1)$, for $i = 6, 7, 8$, uniformly in
$-c \le t \le -c/2$, we conclude that

$$F(c+t) - F(c) - f(c)t \le \frac{f^{(k)}(0)}{(k+1)!} c^{k+1} \left( \frac{1}{2^{k+1}} - 1 + \frac{k+1}{2} \right) (1 + o(1))$$

as $c \downarrow 0$. Since $f^{(k)}(0) < 0$, this proves the lemma. □
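
The bounds (i)-(iii) can be checked numerically for a concrete density. The sketch below uses the half-normal density and assumes that it falls under condition (C2) with $k = 2$, which is consistent with the $n^{2/5}$ rate used for it in the simulations; the value `T0 = 0.5`, the grids and the function names are choices made here for illustration. On $[0, 1]$ the function $|f''(s)| = \sqrt{2/\pi}\,(1 - s^2)e^{-s^2/2}$ is decreasing, so its infimum over $[0, t_0]$ is attained at $t_0$.

```python
import math

ROOT = math.sqrt(2.0 / math.pi)
K = 2      # assumed order of the first nonvanishing derivative at zero
T0 = 0.5   # illustrative choice of t_0


def F(x):
    """Half-normal distribution function."""
    return math.erf(x / math.sqrt(2.0))


def f(x):
    """Half-normal density."""
    return ROOT * math.exp(-0.5 * x * x)


def abs_fk(x):
    """|f''(x)| for the half-normal density."""
    return ROOT * abs(x * x - 1.0) * math.exp(-0.5 * x * x)


INF_FK = abs_fk(T0)  # infimum of |f''| over [0, T0]


def remainder(c, t):
    return F(c + t) - F(c) - f(c) * t


def bound_i(t):
    return -INF_FK * t ** (K + 1) / math.factorial(K + 1)


def bound_ii(t):
    return -INF_FK * (T0 / 2.0) ** K * t / math.factorial(K + 1)


def bound_iii(c, t):
    return -INF_FK * (c / 2.0) ** (K - 1) * t ** 2 / (2 * math.factorial(K - 1))


def worst_gap():
    """Largest value of remainder minus bound over the grids; should not exceed 0."""
    worst = -math.inf
    for i in range(1, 50):
        c = (T0 / 2.0) * i / 50.0
        for j in range(0, 201):
            t = (T0 / 2.0) * j / 200.0                        # case (i)
            worst = max(worst, remainder(c, t) - bound_i(t))
            t = T0 / 2.0 + 5.0 * (j + 1) / 200.0              # case (ii)
            worst = max(worst, remainder(c, t) - bound_ii(t))
            t = -c / 2.0 + (T0 + c) / 2.0 * (j + 1) / 202.0   # case (iii)
            worst = max(worst, remainder(c, t) - bound_iii(c, t))
    return worst
```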

Proof of Lemma 3.2. (i) Decompose $Z_{n1}$ as in (6.1). Let $0 < \varepsilon < x$
and define

$$X_{n1}(t) = n^{a/2} H_n(tn^{-a}) - \varepsilon t/2,$$

where $H_n(t) = E_n(t) - B_n(F(t))$. Next, consider the event

(6.2) $$A_{n1} = \{X_{n1}(s) \ge X_{n1}(t), \text{ for all } s, t \ge 0, \text{ such that } t - s \ge \delta_n\}.$$

Then with $\delta_n = n^{-(1-a)/2}(\log n)^2$, by using (2.2) we have that

$$P(A_{n1}) \ge P\left\{ \sup_{t \in [0,\infty)} |H_n(t)| \le \frac{\varepsilon}{4} n^{-1/2} (\log n)^2 \right\} \to 1.$$

Also define the process $X_{n2}(t) = -n^{a/2} F(tn^{-a}) W_n(1) - \varepsilon t/2$, and
consider the event

(6.3) $$A_{n2} = \{X_{n2}(s) \ge X_{n2}(t), \text{ for all } 0 \le s \le t < \infty\}.$$

Then, since every sample path of the process $X_{n2}$ is differentiable, we have

$$P(A_{n2}) \ge P\left\{ -f(tn^{-a}) W_n(1) - \frac{\varepsilon}{2} n^{a/2} \le 0, \text{ for all } t \in [0,\infty) \right\} \to 1.$$
Hence, if $A_n = A_{n1} \cap A_{n2}$, then $P(A_n) \to 1$. Since for any $\eta > 0$,

$$P\left\{ \left( \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \right) 1_{A_n^c} > \eta \right\} \le P(A_n^c) \to 0,$$

we conclude that $(\operatorname{argmax}_t Z_{n1}(t)) 1_{A_n^c} = O_p(1)$. This means
that we only have to consider $(\operatorname{argmax}_t Z_{n1}(t)) 1_{A_n}$. From
Lemma 2.1 we have

(6.4) $$\left( \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \right) 1_{A_n} \le \operatorname{argmax}_{t \in [0,\infty)} S_{n1}(t) + \delta_n,$$

where

$$S_{n1}(t) = n^{a/2} W_n(F(tn^{-a})) - (x - \varepsilon)t + n^{(1+a)/2}(F(tn^{-a}) - f(0)tn^{-a}).$$

Since $F(tn^{-a}) - f(0)tn^{-a}$ is nonincreasing for $t \ge 0$, according to
Lemma 2.1,

(6.5) $$\operatorname{argmax}_{t \in [0,\infty)} S_{n1}(t) \le \operatorname{argmax}_{t \in [0,\infty)} \{n^{a/2} W_n(F(tn^{-a})) - (x - \varepsilon)t\} \le \sup\{t \ge 0 : n^{a/2} W_n(F(tn^{-a})) - (x - \varepsilon)t \ge 0\}.$$

By change of variables $u = G(t) = n^a F(tn^{-a})$, and using that for
$u \in [0, n^a]$,

(6.6) $$\frac{u}{f(0)} \le G^{-1}(u) \le \frac{u}{f(F^{-1}(un^{-a}))},$$

we find that the right-hand side of (6.5) is bounded by

$$G^{-1}\left( \sup\left\{ u \ge 0 : n^{a/2} W_n(un^{-a}) - \frac{x - \varepsilon}{f(0)} u \ge 0 \right\} \right).$$

By Brownian scaling (2.3),

$$\sup\left\{ u \ge 0 : n^{a/2} W_n(un^{-a}) - \frac{x - \varepsilon}{f(0)} u \ge 0 \right\} \stackrel{d}{=} \frac{f(0)^2}{(x - \varepsilon)^2} \sup\{u \ge 0 : W(u) - u \ge 0\},$$

which is of order $O_p(1)$. The latter can be seen, for instance, from (3.3).
Because $\delta_n = n^{-(1-a)/2}(\log n)^2 = o(1)$, together with (6.4), (6.5) and
(6.6), it follows that

$$0 \le \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \le \left( \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \right) 1_{A_n} + O_p(1) \le O_p(1) + O_p(1),$$

which proves (i).
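
The two-sided bound (6.6) on $G^{-1}$ can be verified directly in a case where everything is explicit. The sketch below assumes the standard exponential distribution, for which $F^{-1}(v) = -\log(1 - v)$ and $f(F^{-1}(v)) = 1 - v$; the function names and the grid are hypothetical choices made here. In that case (6.6) reduces to the elementary inequality $v \le -\log(1 - v) \le v/(1 - v)$ with $v = un^{-a}$.

```python
import math


def g_inverse(u, n, a):
    """G^{-1}(u) = n^a F^{-1}(u n^{-a}) for the standard exponential distribution."""
    return -(n ** a) * math.log1p(-u * n ** (-a))


def bounds_6_6(u, n, a, f0=1.0):
    """Lower and upper bounds in (6.6); here f(F^{-1}(v)) = 1 - v."""
    v = u * n ** (-a)
    return u / f0, u / (1.0 - v)


def check_6_6(n, a, steps=1000, rel_tol=1e-12):
    """Check (6.6) on an interior grid of [0, n^a], allowing for rounding error."""
    top = n ** a
    for i in range(1, steps):
        u = top * i / steps
        lower, upper = bounds_6_6(u, n, a)
        g = g_inverse(u, n, a)
        if not (lower <= g * (1.0 + rel_tol) and g <= upper * (1.0 + rel_tol)):
            return False
    return True
```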

(ii) In this case $a = 1/(2k+1)$, so that the argument up to (6.4) is the
same. Let $\varepsilon > 0$ and $A_n = A_{n1} \cap A_{n2}$, where $A_{n1}$ is as
defined in (6.2) with $\delta_n = n^{-k/(2k+1)}(\log n)^2$ and $A_{n2}$ is as
defined in (6.3). We now find that

(6.7) $$\left( \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \right) 1_{A_n} \le \operatorname{argmax}_{t \in [0,\infty)} S_{n1}(t) + \delta_n \le \sup\{t \ge 0 : S_{n1}(t) \ge 0\} + \delta_n.$$

Let $t_0$ be the value from Lemma 6.1 and consider the event

$$D_{n1} = \{n^{-a} \sup\{t \ge 0 : S_{n1}(t) \ge 0\} \le t_0/2\}.$$

If $S_{n1}(t) \ge 0$, then according to Lemma 6.1(ii), for $tn^{-a} > t_0/2$ and $n$
sufficiently large, we find that

$$0 \le n^{a/2} W_n(F(tn^{-a})) - (x - \varepsilon)t + n^{(1+a)/2}(F(tn^{-a}) - f(0)tn^{-a})$$
$$\le n^{a/2} \sup_{0 \le u \le 1} |W_n(u)| - (x - \varepsilon)t - n^{(1-a)/2} \frac{\inf|f^{(k)}|}{(k+1)!} (t_0/2)^k t$$
$$\le n^{a/2} \sup_{0 \le u \le 1} |W_n(u)| - n^{(1-a)/2} C_1 t \left( 1 + \frac{x - \varepsilon}{C_1 n^{(1-a)/2}} \right)$$
$$\le n^{a/2} \left\{ \sup_{0 \le u \le 1} |W_n(u)| - C_1 n^{1/2} t_0/4 \right\},$$

where $C_1 = \inf|f^{(k)}| (t_0/2)^k/(k+1)!$. Therefore

$$P(D_{n1}^c) \le P\left( \sup_{0 \le u \le 1} |W(u)| > C_1 n^{1/2} t_0/4 \right) \to 0.$$

This means we can restrict ourselves to the event $A_n \cap D_{n1}$, so that by
reasoning analogous to that before, from (6.7) we get

$$\left( \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \right) 1_{A_n \cap D_{n1}} \le \sup\{t \ge 0 : S_{n1}(t) \ge 0\} 1_{D_{n1}} + \delta_n \le \sup\{0 \le t \le n^a t_0/2 : S_{n1}(t) \ge 0\} + \delta_n.$$



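According to Lemma 6.1(i), for $0 \le tn^{-a} \le t_0/2$ and using that
$a = 1/(2k+1)$, we get

$$n^{(1+a)/2}(F(tn^{-a}) - f(0)tn^{-a}) \le -\frac{\inf|f^{(k)}|}{(k+1)!} t^{k+1},$$

so that

(6.8) $$0 \le \left( \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \right) 1_{A_n \cap D_{n1}} \le \sup\left\{ 0 \le t \le n^a t_0/2 : n^{a/2} W_n(F(tn^{-a})) - (x - \varepsilon)t - \frac{\inf|f^{(k)}|}{(k+1)!} t^{k+1} \ge 0 \right\} + \delta_n.$$
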


Next, distinguish between

(A) $-(x - \varepsilon)t - \inf|f^{(k)}| t^{k+1}/(2(k+1)!) \ge 0$,
(B) $-(x - \varepsilon)t - \inf|f^{(k)}| t^{k+1}/(2(k+1)!) < 0$.

Since $t \ge 0$, case (A) can only occur when $x - \varepsilon < 0$, in which case
we have $0 \le t \le (2(k+1)!(\varepsilon - x)/\inf|f^{(k)}|)^{1/k}$, which is of
order $O(1)$. In case (B), it follows that

$$n^{a/2} W_n(F(tn^{-a})) - \frac{\inf|f^{(k)}|}{2(k+1)!} t^{k+1} > 0.$$

We conclude from (6.8) that

(6.9)
$$0 \le \left( \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \right) 1_{A_n \cap D_{n1}}$$
$$\le \sup\left\{ 0 \le t \le n^a t_0/2 : n^{a/2} W_n(F(tn^{-a})) - \frac{\inf|f^{(k)}|}{2(k+1)!} t^{k+1} \ge 0 \right\} + O_p(1) + \delta_n$$
$$\le \sup\left\{ t \in [0,\infty) : n^{a/2} W_n(F(tn^{-a})) - \frac{\inf|f^{(k)}|}{2(k+1)!} t^{k+1} \ge 0 \right\} + O_p(1).$$

Similar to the proof of (i), by change of variables $u = G(t) = n^a F(tn^{-a})$ and
using (6.6) with $a = 1/(2k+1)$, we find that the argmax on the right-hand
side of (6.9) is bounded from above by

$$G^{-1}\left( \sup\left\{ u \in [0,\infty) : n^{a/2} W_n(un^{-a}) - \frac{\inf|f^{(k)}|}{2(k+1)! f(0)^{k+1}} u^{k+1} \ge 0 \right\} \right) + O_p(1).$$

By Brownian scaling (2.3), we obtain that the supremum in the first term
has the same distribution as

$$\left( \frac{2(k+1)! f(0)^{k+1}}{\inf|f^{(k)}|} \right)^{2/(2k+1)} \sup\{u \ge 0 : W(u) - u^{k+1} \ge 0\}.$$

Again by using (3.3), this is of order $O_p(1)$. Similar to the proof of (i), from
(6.6) and (6.9) we find that

$$0 \le \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \le \left( \operatorname{argmax}_{t \in [0,\infty)} Z_{n1}(t) \right) 1_{A_n \cap D_{n1}} + O_p(1) \le O_p(1) + O_p(1),$$

which proves (ii).

(iii) Decompose $Z_{n2}$ as in the proof of Lemma 3.1. Let $\varepsilon > 0$ and
$A_n = A_{n1} \cap A_{n2}$, with $A_{n1}$ defined similarly to (6.2) with
$\delta_n = n^{-(1-b)/2}(\log n)^2$, where $b$ is the same as in Lemma 3.1, and
$A_{n2}$ is defined similarly to (6.3). By the same argument as in the proof of
(i) and (ii), it suffices to consider $(\operatorname{argmax}_t Z_{n2}(t)) 1_{A_n}$.
We find

$$\left( \operatorname{argmax}_{t \in [-cn^{b-a},\infty)} Z_{n2}(t) \right) 1_{A_n} \le \operatorname{argmax}_{t \in [-cn^{b-a},\infty)} M_{n2}(t) + \delta_n \le \sup\{t \ge 0 : M_{n2}(t) \ge 0\} + \delta_n,$$

where $M_{n2}(t)$ has the same distribution as

$$S_{n2}(t) = n^{b/2} W(F(cn^{-a} + tn^{-b}) - F(cn^{-a})) + n^{(b+1)/2}(F(cn^{-a} + tn^{-b}) - F(cn^{-a}) - f(cn^{-a})tn^{-b}) - (x - \varepsilon)t.$$

As in the proof of (ii), consider
$D_{n2} = \{n^{-b} \sup\{t \ge 0 : S_{n2}(t) \ge 0\} \le t_0/2\}$, where $t_0$ is the
value from Lemma 6.1. By the same reasoning as used in the proof of (ii), it
again follows from Lemma 6.1(ii) that $P(D_{n2}^c) \to 0$, so we only have to
consider $\sup\{t \ge 0 : S_{n2}(t) \ge 0\} 1_{D_{n2}}$. Hence, similar to the
proof of (ii) we get

$$\sup\{t \ge 0 : S_{n2}(t) \ge 0\} 1_{D_{n2}} \le \sup\{0 \le t \le n^b t_0/2 : S_{n2}(t) \ge 0\}.$$

Since $b > 1/(2k+1)$, for $k \ge 2$, we cannot proceed as in the proof of (ii)
by using Lemma 6.1(i) to bound the drift term. However, according to
Lemma 6.1(iii), for $0 \le t \le n^b t_0/2$,

$$n^{(b+1)/2}(F(cn^{-a} + tn^{-b}) - F(cn^{-a}) - f(cn^{-a})tn^{-b}) \le -\frac{\inf|f^{(k)}|}{2^k (k-1)!} c^{k-1} t^2,$$

so that $\sup\{0 \le t \le n^b t_0/2 : S_{n2}(t) \ge 0\}$ is bounded from above by

$$\sup\left\{ 0 \le t \le n^b t_0/2 : n^{b/2} W(F(cn^{-a} + tn^{-b}) - F(cn^{-a})) - (x - \varepsilon)t - \frac{\inf|f^{(k)}|}{2^k (k-1)!} c^{k-1} t^2 \ge 0 \right\}.$$

Similarly to (6.9), we conclude that
$\sup\{t \ge 0 : S_{n2}(t) \ge 0\} 1_{D_{n2}}$ is bounded from above by

(6.10) $$\sup\left\{ t \ge 0 : n^{b/2} W_n(F(cn^{-a} + tn^{-b}) - F(cn^{-a})) - \frac{\inf|f^{(k)}|}{2^{k+1} (k-1)!} c^{k-1} t^2 \ge 0 \right\} + O_p(1).$$

Next, change variables $u = G(t) = n^b(F(cn^{-a} + tn^{-b}) - F(cn^{-a}))$. Then
for any $u \in [0, n^b(1 - F(cn^{-a}))]$, it follows that

(6.11) $$\frac{u}{f(0)} \le G^{-1}(u) \le \frac{u}{f(F^{-1}(un^{-b} + F(cn^{-a})))},$$

so that (6.10) is bounded from above by

$$G^{-1}\left( \sup\left\{ u \ge 0 : n^{b/2} W(un^{-b}) - \frac{\inf|f^{(k)}| c^{k-1}}{2^{k+1} (k-1)! f(0)^2} u^2 \ge 0 \right\} \right) + O_p(1).$$

As in the proof of (ii), by Brownian scaling (2.3) together with (6.11), we
find that

(6.12)
$$\operatorname{argmax}_{t \in [-cn^{b-a},\infty)} Z_{n2}(t) \le \left( \operatorname{argmax}_{t \in [-cn^{b-a},\infty)} Z_{n2}(t) \right) 1_{A_n \cap D_{n2}} + O_p(1)$$
$$\le \frac{O_p(1)}{f(F^{-1}(O_p(n^{-b}) + F(cn^{-a})))} + O_p(1) = O_p(1).$$

To obtain a lower bound for the left-hand side of (6.12), first note that

(6.13) $$\operatorname{argmax}_{t \in [-cn^{b-a},\infty)} Z_{n2}(t) \ge \operatorname{argmax}_{t \in [-cn^{b-a},0]} Z_{n2}(t) = -\operatorname{argmax}_{t \in [0,cn^{b-a}]} Z_{n2}(-t).$$

From here the argument runs along the same lines as for the upper bound.
Let $\varepsilon > 0$ and, similarly to (6.2) and (6.3), define the events
$A_{n1}$ and $A_{n2}$ with

$$X_{n1}(t) = n^{b/2} H_n(cn^{-a} - tn^{-b}) - \varepsilon t/2,$$
$$X_{n2}(t) = -n^{b/2} F(cn^{-a} - tn^{-b}) W_n(1) - \varepsilon t/2.$$

With $A_n = A_{n1} \cap A_{n2}$, as before we get
$(\operatorname{argmax}_t Z_{n2}(-t)) 1_{A_n^c} = O_p(1)$ and

$$\left( \operatorname{argmax}_t Z_{n2}(-t) \right) 1_{A_n} \le \operatorname{argmax}_{t \in [0,cn^{b-a})} M_{n3}(t) + \delta_n,$$

where $M_{n3}(t)$ has the same distribution as

$$S_{n3}(t) = n^{b/2} W(F(cn^{-a} - tn^{-b}) - F(cn^{-a})) + n^{(b+1)/2}(F(cn^{-a} - tn^{-b}) - F(cn^{-a}) + f(cn^{-a})tn^{-b}) + (x + \varepsilon)t$$
$$\le n^{b/2} \sup\{|W(u)| : 0 \le u \le f(0)cn^{-a}\} + n^{(b+1)/2}(F(cn^{-a} - tn^{-b}) - F(cn^{-a}) + f(cn^{-a})tn^{-b}) + (x + \varepsilon)t.$$

Consider $D_{n3} = \{n^{-b} \sup\{0 \le t \le cn^{b-a} : S_{n3}(t) \ge 0\} \le cn^{-a}/2\}$,
and note that by Brownian scaling $\sup\{|W(u)| : 0 \le u \le f(0)cn^{-a}\}$ has
the same distribution as $n^{-a/2} \sup\{|W(u)| : 0 \le u \le cf(0)\}$. Reasoning
as in the proof of (ii), using Lemma 6.1(iv), we obtain that for
$cn^{-a}/2 \le n^{-b}t \le cn^{-a}$ and $n$ sufficiently large,

< n (6- Q )/2 gup 

0<«<c/(0) 

+ n( b+1) / 2 (F{cn- a - tn~ b ) - F{cn~ a ) + /(cra^n" 6 ) + (x + e)i 

< n (6-<W gup ^(^i 
\0<u<c/(0) 

- Cl n^ 2k+l ^/ 2 (l + — 



lfl (6+l)/3-(fc+l)a 

< n^' 2 ( sup |W(«)| - 9l n (^k+i) a )/2 
\o<m<c/(o) 2 

Therefore, $P(D_{n3}^c) \to 0$, so we only have to consider
$(\operatorname{argmax}_t S_{n3}(t)) 1_{D_{n3}}$. Hence, similar to the proof of
(ii), we get

$$\operatorname{argmax}_{t \in [0,cn^{b-a})} S_{n3}(t) 1_{D_{n3}} + \delta_n \le \sup\{0 \le t \le cn^{b-a}/2 : S_{n3}(t) \ge 0\} + \delta_n.$$

According to Lemma 6.1(iii), for $0 \le tn^{-b} \le cn^{-a}/2$ we have

(6.14) $$n^{(b+1)/2}(F(cn^{-a} - tn^{-b}) - F(cn^{-a}) + f(cn^{-a})tn^{-b}) \le -\frac{\inf|f^{(k)}|}{2^k (k-1)!} c^{k-1} t^2.$$

Similar to (ii), separate cases and obtain that
$\operatorname{argmax}_{t \in [0,cn^{b-a})} S_{n3}(t) 1_{D_{n3}} + \delta_n$ is
bounded from above by

supjo < t < cn b ~ a /2 : n b/2 W(F(cn- a - tn~ b ) - F(cn" a )) 

After change of variables $u = G(t) = n^b(F(cn^{-a} - tn^{-b}) - F(cn^{-a}))$, and
using that $u \in [-n^b F(cn^{-a}), 0]$, one has

We now find that

$$\operatorname{argmax}_{t \in [0,cn^{b-a})} S_{n3}(t) + \delta_n$$

As above, by Brownian scaling (2.3) together with (6.13), it follows that

argmax Z n2 (t) > + O p {\) = O p {\). 

te[-cn b - a ,oo) J yen ) 

Together with (6.12) this proves the lemma. □